
grpc: set MaxConcurrentStreams to avoid sudden traffic spikes that lead to PD OOM #8977

Open

okJiang wants to merge 1 commit into tikv:master from okJiang:set-maxconcurrency

Conversation

@okJiang
Member

@okJiang okJiang commented Jan 6, 2025

What problem does this PR solve?

Issue Number: Close #8882, ref #4480

What is changed and how does it work?

Added MaxConcurrentStreams to limit request concurrency. This is a self-protection mechanism for PD: once the number of concurrent requests reaches the limit, gRPC makes new requests wait until ongoing ones finish and resources are freed for them. Requests may therefore become slower in this situation, but PD gains better robustness.

PS: This parameter cannot be modified at runtime, so we cannot support changing it via pd-ctl. If you think it should be configurable, please comment.
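
For reference, here is a minimal sketch of how this kind of limit is applied to a grpc-go server. grpc.MaxConcurrentStreams is the standard grpc-go server option; the constant value and function below are placeholders for illustration, not the exact change in this PR:

```go
package server

import "google.golang.org/grpc"

// placeholderMaxConcurrentStreams is illustrative only; the real default is a
// config item added by this PR, not this number.
const placeholderMaxConcurrentStreams uint32 = 1024

// newGRPCServer caps the number of HTTP/2 streams that a single client
// connection may keep open at the same time. Extra RPCs from that connection
// queue until a stream slot frees up, instead of piling up in server memory.
func newGRPCServer(opts ...grpc.ServerOption) *grpc.Server {
	opts = append(opts, grpc.MaxConcurrentStreams(placeholderMaxConcurrentStreams))
	return grpc.NewServer(opts...)
}
```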

Check List

Tests

  • Unit test
  • Integration test

Release note

None.

Signed-off-by: okJiang <819421878@qq.com>
@ti-chi-bot ti-chi-bot Bot added labels on Jan 6, 2025: release-note-none (Denotes a PR that doesn't merit a release note.), dco-signoff: yes (Indicates the PR's author has signed the dco.), size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.)
@codecov

codecov Bot commented Jan 6, 2025

Codecov Report

❌ Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.61%. Comparing base (41919ad) to head (f9a16f4).
⚠️ Report is 545 commits behind head on master.

❌ Your patch check has failed because the patch coverage (57.14%) is below the target coverage (74.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8977      +/-   ##
==========================================
+ Coverage   76.31%   77.61%   +1.30%     
==========================================
  Files         465      532      +67     
  Lines       70547    94134   +23587     
==========================================
+ Hits        53839    73065   +19226     
- Misses      13361    17187    +3826     
- Partials     3347     3882     +535     
Flag        Coverage Δ
unittests   77.61% <57.14%> (∅)

Flags with carried forward coverage won't be shown.


Comment thread on server/config/config.go
defaultGCTunerThreshold = 0.6
minGCTunerThreshold = 0
maxGCTunerThreshold = 0.9
// If concurrentStreams reaches 600k, the memory usage is about 40GB. To

Member

Are there any tests to support this conclusion?

Member Author

@okJiang okJiang Jan 6, 2025

This conclusion comes from a real-world case where the cluster contained millions of regions and its requests were mainly ScanRegions.

Member

CMIIW, it only limits the concurrency for one connection, not total concurrency on the server side.

Member Author

In simple terms, should we treat this sudden surge in traffic as an anomaly? If so, I think we can set this parameter to protect PD. Do you know under what normal circumstances PD would experience such high traffic? For example, a 16 GB PD handling 160k concurrent requests.

Member Author

https://github.com/etcd-io/etcd/blob/fce823ac2830033270f8fe03fa1b56e62bf882b8/server/embed/config.go#L230-L232

// MaxConcurrentStreams specifies the maximum number of concurrent
// streams that each client can open at a time.

If users make massive requests from multiple clients, we have no way to limit the total with this option: it applies per client connection, so k connections can still open up to k × MaxConcurrentStreams streams in total.

Contributor

Is it applicable to both unary and streaming RPCs?

Member Author

Yes. This is a gRPC/HTTP2 transport-level limit, so it applies to both unary RPCs and streaming RPCs.

  • Unary: each in-flight unary call occupies one stream until the call finishes.
  • Stream: each open client/server/bidi stream occupies one stream for its whole lifetime.
  • Scope: it limits concurrent streams per client connection (server transport), not the total concurrency across all clients.

So it can help on both paths, but for streaming it only limits the number of open streams, not the number of messages within one stream. I think this thread is about clarifying the behavior, so no code change is needed here.

Member Author

Rechecked: yes, it applies to both unary RPCs and streaming RPCs, because this limit is enforced at the gRPC/HTTP2 stream layer per client connection.

  • Unary RPC: one in-flight call uses one stream until it completes.
  • Streaming RPC: one open client/server/bidi stream holds one stream for its lifetime.
  • It does not limit the total concurrency across multiple client connections.

So the knob is effective for both unary and stream requests, but for streaming it limits the number of open streams rather than the number of messages sent on one stream. No code change is needed for this clarification.
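
To make the per-connection scope concrete, here is a small standalone sketch (an illustration only, not PD code): it uses grpc-go's built-in health service with a deliberately tiny limit of 1, and shows that a streaming call and a unary call on the same connection share the stream budget, while a second connection gets its own budget.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	// Allow only one concurrent stream per client connection (demo value).
	srv := grpc.NewServer(grpc.MaxConcurrentStreams(1))
	healthpb.RegisterHealthServer(srv, health.NewServer())
	go srv.Serve(lis)
	defer srv.Stop()

	dial := func() healthpb.HealthClient {
		conn, err := grpc.Dial(lis.Addr().String(),
			grpc.WithTransportCredentials(insecure.NewCredentials()))
		if err != nil {
			panic(err)
		}
		return healthpb.NewHealthClient(conn)
	}
	c1, c2 := dial(), dial()

	// The streaming Watch RPC stays open and occupies the only stream slot
	// of connection 1.
	if _, err := c1.Watch(context.Background(), &healthpb.HealthCheckRequest{}); err != nil {
		panic(err)
	}

	// A unary Check on the same connection must wait for a free slot, so it
	// hits the deadline here.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_, err = c1.Check(ctx, &healthpb.HealthCheckRequest{})
	fmt.Println("same connection:", err)

	// A second connection has its own stream budget, so its Check succeeds:
	// the limit is per connection, not a server-wide total.
	_, err = c2.Check(context.Background(), &healthpb.HealthCheckRequest{})
	fmt.Println("other connection:", err)
}
```

With these assumptions, the first Check should fail with DeadlineExceeded while the second succeeds, which matches the per-connection semantics described above.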

@okJiang
Member Author

okJiang commented Feb 8, 2025

No more updates or comments, so I'm closing it now.

@okJiang okJiang closed this Feb 8, 2025
@okJiang okJiang reopened this Aug 7, 2025
@ti-chi-bot
Contributor

ti-chi-bot Bot commented Aug 7, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign benmeadowcroft for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@okJiang
Member Author

okJiang commented Aug 7, 2025

/cc @lhy1024

@ti-chi-bot ti-chi-bot Bot requested a review from lhy1024 August 7, 2025 02:20

Labels

  • dco-signoff: yes (Indicates the PR's author has signed the dco.)
  • release-note-none (Denotes a PR that doesn't merit a release note.)
  • size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

A large volume of requests combined with slow response time will significantly increase the memory usage of the PD

3 participants